Abstract



This analysis exemplifies a method for investigating the construct
validity of the subscales of one or more test batteries. In this study,
the structures of the Comparative Guidance and Placement (CGP) test and
the New Jersey Basic Skills test were examined to see if the subscales
designed to measure the same skills were doing so, and to see if the
nontraditional subscales (Mosaic Comparisons, Year 2000, and Letter
Groups) were measuring something different from the traditional
subscales (Reading, Sentences, and Mathematics).

The methodology employed was a confirmatory factor analysis. Data
from 822 students who had taken both batteries were used to test a
hypothesized four-factor model (Reading, Sentences, Mathematics, and
Mosaic Comparisons); this model was found to fit the data. It was
concluded that Mosaic Comparisons, Year 2000, and Letter Groups each measure
something uniquely different from Reading, Sentences, and Mathematics.

Although technically complex, this methodology is easily and
inexpensively applied to this type of problem. It can be particularly
useful in criterion-referenced test development for testing whether a priori 
subscales are actually measuring different skills. 
Introduction

The Comparative Guidance and Placement (CGP) test battery is designed
to measure skills in reading, sentence structure, and mathematics, plus the
skills required to do three subtests entitled Mosaic Comparisons, Letter
Groups, and Year 2000. A question had arisen as to whether the three
nontraditional subtests were actually measuring something different from the
standard verbal and mathematics tests. A further question was also raised
in connection with the New Jersey Basic Skills test. This test, also
produced by ETS, has subtests for reading, sentences, and mathematics.
Do these subtests measure the same skills as the CGP subtests bearing
similar names?

These questions deal with complementary aspects of construct validity, 
namely, convergent and discriminant validity (see Cronbach, 1971, and 
Campbell and Fiske, 1959). A test has convergent validity if it measures 
what it purports to measure. It has discriminant validity if the skill 
it measures is distinctly different from other skills. 

Recent developments in maximum likelihood confirmatory factor analysis
enable the researcher to test, statistically, the goodness-of-fit of
a priori construct validation models to empirical data. (See Werts & Linn,
1970, for the application of path analysis techniques to the
multitrait-multimethod matrix; see Rock & Werts, 1979, for a recent example.)
When the fit of a given model is not rejected, it can be concluded that some
evidence of the construct validity of the measures has been found and that
the underlying theory of the interrelationships of the variables has been,
to some extent, corroborated. When the relationships involving true scores
are modeled with confirmatory factor structures, a number of psychometric
parameters can be estimated. As Jöreskog (1971) points out, for the
scores that are congeneric (i.e., measures of the same factor), the
maximum likelihood estimates of the factor loadings are the regressions of
the observed scores on their "true" scores. Squared standardized
factor loadings correspond to estimates of the reliability with which
each instrument measures each skill (construct or factor). The
correlation between factors corresponds to the correlation between
variables corrected for attenuation, i.e., the correlation between true
scores (see Werts & Linn, 1972).
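
The two relationships just described can be illustrated with a small
numerical sketch. It assumes the classical correction-for-attenuation
formula (observed correlation divided by the square root of the product of
the two reliabilities); the function names and the input values are
hypothetical and are not taken from this study.

```python
import math


def reliability_from_loading(std_loading: float) -> float:
    """Squared standardized loading, read as a reliability estimate."""
    return std_loading ** 2


def disattenuated_correlation(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Classical correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical values, chosen only for illustration.
rel_x = reliability_from_loading(0.80)  # reliability of measure x
rel_y = reliability_from_loading(0.70)  # reliability of measure y
r_true = disattenuated_correlation(0.28, rel_x, rel_y)
print(f"estimated true-score correlation: {r_true:.2f}")
```

In a confirmatory factor model the factor correlation plays the role of
`r_true`, so no separate correction step is needed.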

Data 

When students take the CGP, they complete four separately timed
reading sections, four sections of sentences, three sections of Mosaic
Comparisons, one section entitled Letter Groups, and one section called
Year 2000. For the mathematics tests, they are instructed to take math
level C if they have had no algebra in high school. Level C consists of
two sections; one is computation and the other consists of arithmetic
reasoning. Students who have had one year of algebra in high school are
told to take level D. This consists of the same computation test included
in level C and an elementary algebra test. Level E, which is taken by those
who have had two years of high school algebra, consists of the same elementary
algebra test plus an intermediate algebra test. Thus all students take two
mathematics tests.

In the New Jersey Basic Skills Test, there are three reading subtests,
three sentence subtests, one computation test, and one elementary algebra
test. All of these are taken by all students.

Data from 822 students who had taken both the CGP and the New Jersey
Basic Skills Tests were used for analysis. Scores were divided into three
groups corresponding to the level of CGP Mathematics taken. Level C
consisted of 184 students, level D had 282 students, and level E consisted
of 356 students.

Method 

A model was developed in which Reading, Sentences, Mosaic Comparisons,
Math Computation, and Elementary Algebra were hypothesized as five
constructs (factors) underlying the scores being studied. The four Reading
scores on the CGP and three Reading subscores of the New Jersey test were
hypothesized to load on a single "Reading" factor. This factor would be
interpreted as "true" reading skill. Likewise, the four CGP Sentences
subscores and the three New Jersey Sentences subscores were hypothesized
to fit a single factor labeled "Sentences."

Because it was not certain whether Math Computation and Elementary
Algebra would fit a single math factor, and because there was no a priori
necessity that they do so, two different math factors were hypothesized.
One measure of each factor, Computation and Elementary Algebra, arose from
each test battery.

The three sections of Mosaic Comparisons were hypothesized to form a 
correlated but distinctly different factor from the other four. 

Since there was only one score for Year 2000 and one for Letter 
Groups, these measures could not be treated as separate factors. Instead, 
they were permitted to load on all other factors. By permitting them to
do so, estimates could then be made of the degree to which scores on those 
subtests could be explained by each of the factors (i.e., each of the 
traditional subject areas). 
The COFAMM program (Sörbom & Jöreskog, 1976) was used to test the fit
of this model to the data, simultaneously for each of the three subgroups
of students, as defined by the three math subtest levels. COFAMM assumes
that a factor analysis model holds in each of the populations under
study. If $x_g$ is defined as the vector of the $p$ observed measures in
group $g$, then $x_g$ can be accounted for by $k$ common factors ($f_g$)
and $p$ unique factors ($z_g$). The model in each population is:

$$x_g = \nu_g + \Lambda_g f_g + z_g \tag{1}$$

where $\nu_g$ is a $p \times 1$ vector of location parameters and
$\Lambda_g$ a $p \times k$ matrix of factor loadings. It is assumed that
$z_g$ and $f_g$ are uncorrelated, that the expectation of $z_g$ is 0, and
that the expectation of $f_g$ is $\theta_g$, where $\theta_g$ is a
$k \times 1$ parameter vector.

Given these assumptions, the mean vector $\mu_g$ of $x_g$ is

$$\mu_g = \nu_g + \Lambda_g \theta_g \tag{2}$$

and the expected variance-covariance matrix of $x_g$ is

$$\Sigma_g = \Lambda_g \Phi_g \Lambda_g' + \Psi_g \tag{3}$$

where $\Phi_g$ is the variance-covariance matrix of $f_g$ and $\Psi_g$ is
the variance-covariance matrix of $z_g$. When the factor model
does not fit the data perfectly, the observed variance-covariance
matrices $S_g$ and observed means will differ from the maximum
likelihood estimates of $\Sigma_g$ and $\mu_g$. The program yields a
chi-square statistic that is a measure of these differences, that is, of how
well the hypothesized model fits the data compared to the null hypothesis
that the variance-covariance matrix of $x_g$ may have any structure
whatsoever.
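
Equations (2) and (3) can be sketched numerically for a single group as
follows. This is an illustration only: the toy loadings and factor
correlation are hypothetical, and `ml_discrepancy` is the standard
single-group, covariance-only maximum likelihood discrepancy function, which
is assumed here for illustration and is not necessarily the exact
multi-group function that COFAMM minimizes.

```python
import numpy as np


def implied_moments(nu, lam, theta, phi, psi):
    """Model-implied mean vector (eq. 2) and covariance matrix (eq. 3)."""
    mu = nu + lam @ theta
    sigma = lam @ phi @ lam.T + psi
    return mu, sigma


def ml_discrepancy(s, sigma):
    """ln|Sigma| + tr(S Sigma^-1) - ln|S| - p; zero when S equals Sigma."""
    p = s.shape[0]
    _, logdet_sigma = np.linalg.slogdet(sigma)
    _, logdet_s = np.linalg.slogdet(s)
    return logdet_sigma + np.trace(s @ np.linalg.inv(sigma)) - logdet_s - p


# Hypothetical toy model: p = 4 observed scores, k = 2 correlated factors.
lam = np.array([[0.8, 0.0],
                [0.7, 0.0],
                [0.0, 0.6],
                [0.0, 0.9]])
phi = np.array([[1.0, 0.5],
                [0.5, 1.0]])
# Unique variances chosen so that every observed score has unit variance.
psi = np.diag(1.0 - np.sum((lam @ phi) * lam, axis=1))
nu = np.zeros(4)
theta = np.zeros(2)

mu, sigma = implied_moments(nu, lam, theta, phi, psi)
print(np.round(sigma, 3))
print(ml_discrepancy(sigma, sigma))  # perfect fit gives (numerically) zero
```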
The four matrices $\theta_g$, $\Lambda_g$, $\Phi_g$, and $\Psi_g$ are
called the pattern matrices. The elements of these matrices are the model
parameters, which are of three kinds: (a) fixed parameters, which have been
assigned given values, like 0 or 1; (b) constrained parameters,
which are unknown but equal to one or more other parameters; and
(c) free parameters, which are unknown and not constrained to be
equal to any other parameter. A parameter may be constrained to
be equal to other parameters in the same and/or different pattern
matrices in the same and/or in different groups.
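
As a rough sketch of the fixed/free distinction, the loading pattern
hypothesized in the Method section can be written as a boolean mask in which
True marks a loading that is estimated and False marks a loading fixed at
zero. The names `FACTORS`, `BLOCKS`, and `loading_pattern` are hypothetical,
and the sketch ignores whatever additional parameters must be fixed to set
the scale of each factor.

```python
import numpy as np

FACTORS = ["Reading", "Sentences", "Computation", "Elementary Algebra",
           "Mosaic Comparisons"]

# (number of observed scores, factors those scores are allowed to load on)
BLOCKS = [
    (7, ["Reading"]),             # 4 CGP + 3 New Jersey reading subscores
    (7, ["Sentences"]),           # 4 CGP + 3 New Jersey sentences subscores
    (2, ["Computation"]),         # one computation score per battery
    (2, ["Elementary Algebra"]),  # one elementary algebra score per battery
    (3, ["Mosaic Comparisons"]),  # three Mosaic Comparisons sections
    (2, FACTORS),                 # Year 2000 and Letter Groups load on all
]


def loading_pattern(blocks, factors):
    """Boolean p x k mask: True = free loading, False = fixed at zero."""
    rows = []
    for n_scores, allowed in blocks:
        row = [name in allowed for name in factors]
        rows.extend([row] * n_scores)
    return np.array(rows, dtype=bool)


pattern = loading_pattern(BLOCKS, FACTORS)
n_free = int(pattern.sum())
print(pattern.shape, n_free)
```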

The important feature of a confirmatory analysis is that the
parameters of the model may be uniquely estimated, i.e., the model is
identified subject to an algebraic constraint. According to this
constraint, a solution is unique if all linear transformations of the
factors that leave the fixed parameters unchanged also leave the free
parameters unchanged. It is difficult in general to give useful
conditions which are sufficient for identification. However, at one point
in the program the information matrix for the unknown parameters is
computed. If this matrix is positive definite, it is almost certain
that the model is identified. If this matrix is not positive definite,
the program prints a message to this effect, specifying which parameter
is probably not identified.
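
The positive-definiteness check described above can be sketched with an
eigenvalue test. This is a generic numerical illustration, not COFAMM's
internal routine; the function name, the tolerance, and the two small
matrices are hypothetical.

```python
import numpy as np


def is_positive_definite(matrix, tol=1e-10):
    """True if the smallest eigenvalue of the symmetrized matrix exceeds tol."""
    sym = (matrix + matrix.T) / 2.0
    return bool(np.linalg.eigvalsh(sym).min() > tol)


# Hypothetical 2 x 2 information matrices for two unknown parameters.
info_ok = np.array([[2.0, 0.5],
                    [0.5, 1.0]])
# Two parameters that the data cannot tell apart give a singular matrix.
info_bad = np.array([[1.0, 1.0],
                     [1.0, 1.0]])

print(is_positive_definite(info_ok), is_positive_definite(info_bad))
```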

In this study, the model is overidentified, yielding not only unique
solutions but sufficient degrees of freedom for statistical tests of
goodness-of-fit. In addition, standard errors for all the unknown
parameter estimates are also provided by the program. This analysis
differs from an exploratory factor analysis. In an exploratory analysis,
the model is usually not identified, and thus there is neither a
statistical test of goodness-of-fit nor a unique solution.



The factor pattern of the initial model to be tested was a special
case of equation (1), with 23 variables and 5 hypothesized factors. These
factors were Reading, Sentences, Computation, Algebra, and Mosaic
Comparisons. The structure being tested was hypothesized to be the same across
all three groups of subjects. The statistical test of this model using
COFAMM yielded a chi-square of 994.56 with 720 degrees of freedom.
Considerable difficulty was encountered in finding a solution for this model,
however, because of a high collinearity between the two math factors
(r = .995). This is because regression weights on highly correlated
independent variables have large standard errors (see Farrar & Glauber, 1967).
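
The effect of collinearity on standard errors can be made concrete with the
variance inflation factor for the two-predictor case, 1 / (1 - r^2). This is
a textbook regression result used here only as an illustration of the point
above; it is not a computation reported in the study, and the function name
is hypothetical.

```python
import math


def variance_inflation(r: float) -> float:
    """Variance inflation for two predictors correlated r: 1 / (1 - r^2)."""
    return 1.0 / (1.0 - r ** 2)


vif = variance_inflation(0.995)  # correlation between the two math factors
se_ratio = math.sqrt(vif)        # inflation of the standard error itself
print(f"variance inflated about {vif:.0f}-fold, "
      f"standard error about {se_ratio:.0f}-fold")
```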

The model was, therefore, reduced to four factors with Computation
and Elementary Algebra scores being permitted to load on a single
"Mathematics" factor. When tested, this model yielded a chi-square of
1020.5 with 721 degrees of freedom and an RMS of the residuals of .07.

Because of the large sample size, even the most trivial deviations
from the model would tend to yield a statistically significant chi-square
value. This chi-square, however, is relatively small compared to the degrees
of freedom. A more appropriate measure of goodness-of-fit is the root mean
square (RMS) of the residuals. This is the square root of the average
squared difference between corresponding elements in each population's
observed variance-covariance matrix and the reproduced variance-covariance
matrix conditional on the constrained factor model. Ordinarily, it would
not be easy to interpret these indices in the case of variance-covariance
matrices; therefore, the original variance-covariance matrices were
rescaled to correlation matrices (e.g., see Sörbom & Jöreskog, 1976). Thus,
the RMS may be interpreted much as one would interpret the residuals when
fitting a factor model to the observed correlation matrix.
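
A minimal sketch of the two fit summaries discussed here follows. The
chi-square and degrees of freedom are those reported for the four-factor
model; the two 3 x 3 correlation matrices are hypothetical, and averaging
over the lower triangle including the diagonal is an assumption, since the
exact set of elements averaged is not stated.

```python
import numpy as np


def rms_residual(observed, reproduced):
    """Root mean square difference over the lower triangle (with diagonal)."""
    rows, cols = np.tril_indices(observed.shape[0])
    diff = observed[rows, cols] - reproduced[rows, cols]
    return float(np.sqrt(np.mean(diff ** 2)))


# Chi-square relative to its degrees of freedom for the four-factor model.
chi_square_ratio = 1020.5 / 721  # about 1.42

# Hypothetical observed and model-reproduced correlation matrices.
observed = np.array([[1.0, 0.5, 0.3],
                     [0.5, 1.0, 0.4],
                     [0.3, 0.4, 1.0]])
reproduced = np.array([[1.0, 0.4, 0.3],
                       [0.4, 1.0, 0.5],
                       [0.3, 0.5, 1.0]])

rms = rms_residual(observed, reproduced)
print(round(chi_square_ratio, 2), round(rms, 3))
```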

The RMS of .07 found here is satisfactory considering the sample
sizes and differences between the three mathematics ability groups.
Table 1 shows the standardized factor loadings and percentage of variance
in each observed score that can be explained by each factor. Table 2 shows
the intercorrelation of the four factors. Standard errors for these
estimates were about .05 for all factor loadings with the exception of
those for Year 2000 and Letter Groups, which had standard errors of .06
for their loadings on the Reading and Sentences factors. Standard errors
were .04 in the correlations between factors, except for the correlation
between Reading and Sentences, which had a standard error of only .02.

A diagram of the model is shown in Figure 1 with standardized
factor loadings indicated. To simplify the drawing, intercorrelations
of factors are shown separately in Figure 2. The circles represent
factors (constructs or true scores), and arrows from the factors are
shown to indicate that those true scores underlie the observed scores to
which they point. The small arrows indicate the measurement error
component of the observed score. These figures merely show
diagrammatically the same information contained in the tables.

The factor intercorrelations can be interpreted as correlations
between true scores, i.e., correlations corrected for attenuation.
Mosaic Comparisons, when corrected for attenuation, correlate only
slightly with Reading and Sentence skills (.20 and .26) but
moderately with Mathematics (.37). These results suggest that although
there is some relationship between the abilities required to do Mosaic
Comparisons and the abilities required to do traditional verbal and math
tests, there are other unique skills required to do Mosaic Comparisons.

The standardized factor loadings obtained for each subtest score are
most useful when squared and interpreted as percentages of variance
explained by the underlying factor. Because Year 2000 and Letter Groups
loaded on all four intercorrelated factors, their factor loadings are partial
correlations. It can be seen that very little variance in either score is
explained by Reading or the skill underlying Mosaic Comparisons. For Year
2000, 12% of the variance is explained by the Sentences factor and 8% by Math
skill. The combination of all four factors, however, accounts for 54% of
the variance in Year 2000. (This number is obtained by subtracting the
unknown unique variance from unity rather than by adding the squared
standardized factor loadings because, for Year 2000 and Letter Groups,
these are partial factor loadings.)
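
The difference between adding squared loadings and accounting for the
factor intercorrelations can be checked approximately from the rounded
values in Tables 1 and 2. The sketch below assumes the Year 2000 loadings
are read from Table 1 in the order Reading, Sentences, Mathematics, Mosaic
Comparisons, and computes the explained variance as the quadratic form of
the loadings with the factor correlation matrix; the function name is
hypothetical.

```python
import numpy as np

# Factor intercorrelations from Table 2
# (order: Reading, Sentences, Mathematics, Mosaic Comparisons).
phi = np.array([[1.00, 0.83, 0.50, 0.20],
                [0.83, 1.00, 0.48, 0.26],
                [0.50, 0.48, 1.00, 0.37],
                [0.20, 0.26, 0.37, 1.00]])

# Standardized Year 2000 loadings as read from Table 1 (same factor order).
year_2000 = np.array([0.15, 0.35, 0.29, 0.14])


def explained_variance(loadings, factor_corr):
    """Variance explained by correlated factors: lambda' Phi lambda."""
    return float(loadings @ factor_corr @ loadings)


naive_sum = float(np.sum(year_2000 ** 2))     # ignores factor correlations
total = explained_variance(year_2000, phi)    # includes cross-product terms
print(f"sum of squared loadings: {naive_sum:.2f}, explained: {total:.2f}")
```

With these rounded inputs the quadratic form comes to roughly .54, in line
with the 54% reported, while the simple sum of squared loadings is only
about .25.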

For Letter Groups, 10% of the variance can be explained by Math 
skill and 23% by Sentences. Note that the Letter Groups score is as 
reliable a measure of the sentences factor as is the first CGP Sentences 
subscore. Nevertheless, a considerable amount of variance in the Letter 
Groups scores remains unexplained, the four factors explaining only 40%
of the variance. 

Pictorially, the effects of the four factors on Year 2000 and Letter
Groups scores can be shown more clearly than in Figure 1 by recourse to
two other figures excerpted from Figure 1. Figures 3a and 3b show the
observed scores as functions of four factors plus uniqueness ($\psi$).
It may be concluded from this study that while Mosaic Comparisons,
Year 2000, and Letter Groups each have some variance explainable by the
traditional verbal and mathematical test scores, each has a considerable
proportion of unique variance ($\psi$), uncorrelated with reading and math,
that cannot be identified on the basis of these data. The question of whether
these unique skills are related to college performance remains to be
answered.

In addition to these conclusions, a number of other interpretations
based on this model are notable. Because the model fit the data reasonably
well, this analysis provides some evidence of construct validity for the
Reading, Sentences, and Math tests in both test batteries. The fact that
all subscales loaded on the expected factors supports the hypothesis that
they are measuring what they were designed to measure. The Reading
subscales, for example, all measure Reading rather than, say, Sentences.
We might have discovered, for example, that CGP Reading subscale 4 loaded
on the Sentences factor rather than the Reading factor. Or, we might
have found that the New Jersey tests had a different Reading factor than
the CGP did. The fact that this did not occur lends support to the
validity of the Reading subtests. Likewise, the math subtests from both
batteries loaded on a Math factor rather than on a Reading factor or on
two different Math factors. The fact that this model fit as hypothesized
provides evidence of both convergent and discriminant validity (and
hence, construct validity) for the traditional subtests of both batteries.

Another point of interest is that in the analysis of those students
who took CGP Math test C (i.e., those who had no Algebra course), the
factor loading for the New Jersey Algebra scores was found to be only
.16. This finding is consistent with the presupposition that an Algebra
test cannot adequately measure knowledge of Algebra in a group of students
who never studied it. For many students it may be measuring how well
they figure out a problem by some other method or how clever they are.
The factor loadings for Algebra reported in this paper are based only
on data from students taking CGP Math test D or E.

Also of interest is the finding that the Computation and Algebra
subtests from both batteries loaded on a single Math factor. The fact
that this occurred supports the hypothesis that Algebra is just a harder
form of Mathematics and is not a different construct. The finding
provides support for any attempt that might be made to equate the two.

Similarly, it is possible on the basis of this analysis to calibrate
all subtests loading on the same factor (Werts, Grandy, & Schabacker,
1980). The analysis supports the hypothesis that the CGP Reading
subtests, for example, are measuring the same construct as the Reading
subtests of the New Jersey Test. It is justifiable, therefore, to
calibrate the New Jersey battery to the CGP battery.

Summary of Major Findings

The primary purposes of the study were twofold: (1) to investigate
whether certain subtests of the CGP (Mosaic Comparisons, Year 2000, and
Letter Groups) are measuring skills uniquely different from traditional
verbal and mathematical skills; (2) to test whether the New Jersey Basic
Skills subtests are measuring the same skills as similarly named subtests
of the CGP. A five-factor model (Reading, Sentences, Mosaic Comparisons,
Computation, and Algebra) was hypothesized and found to fit the data
from the two test batteries. Those tests having the same names were
found to be measuring the same skills. It was concluded, therefore, that
both batteries have convergent validity. Two of the factors, Math
Computation and Algebra, were so highly correlated that they were
inseparable. Estimates were, therefore, obtained from a model consisting of
four factors: Reading, Sentences, Mosaic Comparisons, and Mathematics.
The skill required to do Mosaic Comparisons was found to correlate
moderately (r = .37) with Mathematics and less well with the other two
factors. Scores on Year 2000 loaded most heavily on the Sentences factor,
as did the scores on Letter Groups. Letter Groups was, in fact, found to
be as reliable a measure of the factor underlying Sentences as the first
of the Sentences subscores. On the other hand, only 40% of the variance
in Letter Groups could be explained by the four factors. The variance
in Year 2000 scores was found to be 54% explained by the four factors.

It was concluded that while verbal and mathematical skills can, to
some extent, account for a student's performance on Mosaic Comparisons, 
Year 2000, and Letter Groups, each is also measuring something distinctly 
different. They may, therefore, be said to have discriminant validity, 
and hence, construct validity. Whether the unique skills underlying 
these measures are relevant to college performance could not be ascer- 
tained from the existing data. 
Table 1

Standardized Factor Loadings and Percentage of
Variance (in Parentheses) by Each Factor

                                                        Factor

Subscore                            Reading      Sentences    Mathematics  Mosaic Comparisons

CGP Reading I                       0.62 (38%)
  (Main Idea)
CGP Reading II                      0.79 (62%)
  (Secondary Idea)
CGP Reading III                     0.76 (58%)
  (Inferences)
CGP Reading IV                      0.81 (66%)
  (Vocabulary)
NJ Reading I                        0.76 (58%)
  (Main Idea)
NJ Reading II                       0.72 (52%)
  (Direct Statements)
NJ Reading III                      0.75 (56%)
  (Inferences)
CGP Sentences I                                  0.50 (25%)
  (Idiom and Diction)
CGP Sentences II                                 0.63 (40%)
  (Coordination and Subordination)
CGP Sentences III                                0.67 (45%)
  (Agreement and Reference)
CGP Sentences IV                                 0.74 (55%)
  (Other)
NJ Sentence Structure I                          0.75 (56%)
  (Complete Sentences)
NJ Sentence Structure II                         0.76 (58%)
  (Coordination and Subordination)
NJ Sentence Structure III                        0.74 (55%)
  (Placing Modifiers)
CGP Math Computation                                          0.83 (69%)
CGP Elementary Algebra                                        0.84 (71%)
NJ Math Computation                                           0.75 (56%)
NJ Elementary Algebra                                         0.72 (52%)
CGP Mosaic Comparisons I                                                   0.78 (61%)
CGP Mosaic Comparisons II                                                  0.90 (81%)
CGP Mosaic Comparisons III                                                 0.78 (61%)
CGP Year 2000*                      0.15 (2%)    0.35 (12%)   0.29 (8%)    0.14 (2%)
CGP Letter Groups**                 -0.21 (4%)   0.48 (23%)   0.32 (10%)   0.22 (5%)

*Total variance explained by all four factors = 54%.
**Total variance explained by all four factors = 40%.

Table 2

Intercorrelations of the Four CGP - New Jersey Basic Skills Factors

                      Reading   Sentences   Mathematics   Mosaic Comparisons
Reading                1.00
Sentences              0.83      1.00
Mathematics            0.50      0.48        1.00
Mosaic Comparisons     0.20      0.26        0.37          1.00

Figure 1. Confirmatory factor analysis model of CGP and N.J. Basic Skills
Test scores. (Intercorrelations of factors not shown.)